Efficient Calculation of the Gauss-Newton Approximation of the Hessian Matrix in Neural Networks
Authors
Abstract
The Levenberg-Marquardt (LM) learning algorithm is a popular algorithm for training neural networks; however, for large neural networks, it becomes prohibitively expensive in terms of running time and memory requirements. The most time-critical step of the algorithm is the calculation of the Gauss-Newton matrix, which is formed by multiplying two large Jacobian matrices together. We propose a method that uses backpropagation to reduce the time of this matrix-matrix multiplication. This reduces the overall asymptotic running time of the LM algorithm by a factor of the order of the number of output nodes in the neural network.
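In Levenberg-Marquardt training the Gauss-Newton matrix is G = J^T J, where J is the Jacobian of all network outputs over the training set with respect to the parameters. The sketch below, in JAX, contrasts the naive route of materializing J and multiplying it with itself against a matrix-free route built from Jacobian-vector and vector-Jacobian (backpropagation) products. It only illustrates the quantities the abstract refers to, not the paper's algorithm; the tiny MLP, parameter layout, and data shapes are assumptions made for the example.

```python
# Illustrative sketch only (JAX): two routes to the Gauss-Newton matrix G = J^T J
# for a tiny MLP. The network, parameter layout, and data are assumptions made
# for this example; the paper's own backpropagation-based scheme is in the full text.
import jax
import jax.numpy as jnp

def mlp(params, x):
    # Toy network: 5 inputs -> 4 tanh hidden units -> 3 linear outputs,
    # with all weights and biases packed into one flat parameter vector.
    W1 = params[:20].reshape(4, 5)
    b1 = params[20:24]
    W2 = params[24:36].reshape(3, 4)
    b2 = params[36:39]
    return W2 @ jnp.tanh(W1 @ x + b1) + b2

params = jax.random.normal(jax.random.PRNGKey(0), (39,))
X = jax.random.normal(jax.random.PRNGKey(1), (8, 5))      # 8 training inputs

def outputs(p):
    # Stack every output node for every training example into one long vector,
    # so its Jacobian with respect to the parameters is the J in G = J^T J.
    return jax.vmap(lambda x: mlp(p, x))(X).reshape(-1)    # shape (8 * 3,)

# Route 1: materialize the full Jacobian, then multiply the two large matrices.
J = jax.jacrev(outputs)(params)                            # shape (24, 39)
G_explicit = J.T @ J                                       # shape (39, 39)

# Route 2: matrix-free Gauss-Newton-vector products, one forward-mode (JVP) and
# one reverse-mode (VJP, i.e. backpropagation) pass per vector, never forming J.
def gnvp(v):
    _, Jv = jax.jvp(outputs, (params,), (v,))              # J v
    (JTJv,) = jax.vjp(outputs, params)[1](Jv)              # J^T (J v)
    return JTJv

# Applying the product to every basis vector recovers G (symmetric, so rows = columns).
G_matfree = jax.vmap(gnvp)(jnp.eye(params.shape[0]))
print(jnp.allclose(G_explicit, G_matfree, atol=1e-5))      # True: the routes agree
```

With N training examples, M output nodes, and P parameters, Route 1 builds an (N·M)×P Jacobian and pays for an explicit matrix-matrix product; the paper's contribution, per the abstract, is to perform that multiplication with backpropagation-style passes and thereby remove a factor on the order of M from the asymptotic running time of LM.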
Similar papers
Using an Efficient Penalty Method for Solving Linear Least Square Problem with Nonlinear Constraints
In this paper, we use a penalty method for solving the linear least squares problem with nonlinear constraints. In each iteration of penalty methods for solving the problem, the calculation of the projected Hessian matrix is required. Given that the objective function is linear least squares, the projected Hessian matrix of the penalty function consists of two parts; the exact amount of a part of i...
Block-diagonal Hessian-free Optimization for Training Neural Networks
Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are rarely applied to deep learning in practice because of high computational cost and the need for model-dependent algorithmic variations. We introduce a variant of ...
Preconditioning for Hessian-Free Optimization
Recently Martens adapted the Hessian-free optimization method for the training of deep neural networks. One key aspect of this approach is that the Hessian is never computed explicitly; instead, the Conjugate Gradient (CG) algorithm is used to compute the new search direction by applying only matrix-vector products of the Hessian with arbitrary vectors (see the Hessian-vector-product sketch after this list). This can be done efficiently using a varian...
A New Load-Flow Method in Distribution Networks based on an Approximation Voltage-Dependent Load model in Extensive Presence of Distributed Generation Sources
Power-flow (PF) solution is a basic and powerful tool in power system analysis. Distribution networks (DNs), compared to transmission systems, have many fundamental distinctions that cause the conventional PF to be ineffective on these networks. This paper presents a new fast and efficient PF method which provides all different models of Distributed Generations (DGs) and their operational modes...
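The Hessian-free entries above compute search directions from matrix-vector products of the Hessian with arbitrary vectors rather than from the Hessian itself. Below is a minimal sketch of one standard way to form such a Hessian-vector product, by differentiating the gradient along a direction; the toy loss and function names are assumptions made for illustration, not code from those papers.

```python
# Minimal sketch (assumed toy example, not code from the papers above): a
# Hessian-vector product H v obtained by differentiating the gradient along v,
# so H itself is never formed -- the operation CG-based Hessian-free methods need.
import jax
import jax.numpy as jnp

def loss(w):
    # Stand-in scalar objective playing the role of a network's training loss.
    return jnp.sum(jnp.tanh(w) ** 2)

def hvp(w, v):
    # Forward-mode derivative of the reverse-mode gradient: jvp(grad(loss)) = H v.
    return jax.jvp(jax.grad(loss), (w,), (v,))[1]

w = jnp.arange(1.0, 6.0)              # 5 parameters
v = jnp.ones_like(w)
print(hvp(w, v))                      # equals jax.hessian(loss)(w) @ v
```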
Journal: Neural Computation
Volume: 24, Issue: 3
Pages: -
Publication year: 2012